Figure 1: Residential Features: (a) Census Unit Scales (b) Urban Context Typology (Marcos et al., 2015)
Figure 2: School locations (kernel density estimation)
| Family educational level | Residential | School |
|---|---|---|
| Primary (PRI) | 273 811 (24%) | 9 332 (26%) |
| Secondary (SEC) | 452 037 (39%) | 11 796 (33%) |
| Higher education (SUP) | 423 876 (37%) | 15 158 (42%) |
| Urban context | Simple | Model | Ratio (Model / Simple) |
|---|---|---|---|
| Colonial Historic City | 436 | 453 | 1.04 |
| Central Business District | 551 | 618 | 1.12 |
| High-Class Residential | 405 | 494 | 1.22 |
| Middle-Class Residential | 500 | 656 | 1.31 |
| Low-Class Residential | 509 | 723 | 1.42 |
| Social housing | 650 | 1017 | 1.56 |
| Urban Informal Neighborhoods | 730 | 1719 | 2.35 |
| Total mean | 503 | 738 | 1.47 |
| Segregation Index | Residential Segregation | School Segregation | Model Segregation | % Explained |
|---|---|---|---|---|
| Dissimilarity: Primary and Higher education (\(D_{pri;hig}\)) | 0.37 | 0.62 | 0.40 | 64.2% |
| Multi-group dissimilarity (\(D^{*}\)) | 0.20 | 0.39 | 0.21 | 54.0% |
| Mutual Information Index (\(M\)) | 0.07 | 0.20 | 0.08 | 41.3% |
| Segregation: Primary (\(IS_{pri}\)) | 0.29 | 0.47 | 0.30 | 63.2% |
| Segregation: Secondary (\(IS_{sec}\)) | 0.09 | 0.22 | 0.07 | 32.0% |
| Segregation: Higher education (\(IS_{sup}\)) | 0.26 | 0.48 | 0.28 | 58.4% |
| Multi-group Gini (\(G^{*}\)) | 0.29 | 0.52 | 0.30 | 57.5% |
| Normalized Isolation: Primary (\(ETA_{pri}^2\)) | 0.11 | 0.23 | 0.12 | 52.6% |
| Normalized Isolation: Secondary (\(ETA_{sec}^2\)) | 0.01 | 0.06 | 0.01 | 11.8% |
| Normalized Isolation: Higher education (\(ETA_{sup}^2\)) | 0.10 | 0.30 | 0.11 | 36.1% |
| Multi-group normalized exposure (\(P^{*}\)) | 0.07 | 0.20 | 0.07 | 35.6% |
Figure 3: Segregation index by Census Unit Scales (Radio, Fracción, Barrio, and Comuna)
The detailed R code is available on the author’s GitHub repository.
For each of the J census units, we specify:
- A demand vector, where each element represents the size of the ‘demand population’ in census unit j.
- An educational profile matrix of dimension J x 3 (one column per education category), where each row sums to 1.

This information is assigned to the centroid of each census unit, since distances measured from the centroid are considered to represent the average distance travelled by each potential student.
We estimate the number of places available at each school, using the number of students enrolled in the previous year as a proxy. This yields the supply vector, where each element represents the maximum number of students that school k can accommodate.
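The inputs described so far can be sketched as plain lists. This is an illustrative Python sketch rather than the author's R code: the function name `check_inputs` and all numbers are hypothetical, and the feasibility check (total capacity at least equal to total demand) is an assumption that the text does not state explicitly.

```python
import math

def check_inputs(demand, profiles, supply):
    """Validate the model inputs; return True if every student can be placed."""
    if len(demand) != len(profiles):
        raise ValueError("need one educational profile per census unit")
    if any(d < 0 for d in demand) or any(s < 0 for s in supply):
        raise ValueError("demand and supply must be non-negative")
    for row in profiles:
        # Each row holds the PRI/SEC/SUP shares of one census unit.
        if len(row) != 3 or not math.isclose(sum(row), 1.0, abs_tol=1e-9):
            raise ValueError("each profile row must hold 3 shares summing to 1")
    # Assumption: a feasible assignment needs total capacity >= total demand.
    return sum(demand) <= sum(supply)

# Hypothetical toy data: J = 3 census units, K = 2 schools.
demand = [20, 10, 15]                    # students living in each census unit
profiles = [[0.5, 0.3, 0.2],             # PRI, SEC, SUP shares per census unit
            [0.1, 0.3, 0.6],
            [0.3, 0.4, 0.3]]
supply = [25, 30]                        # previous-year enrolment as capacity proxy
feasible = check_inputs(demand, profiles, supply)
```

Running the check before the optimisation step makes an infeasible problem (more students than places) visible early, instead of surfacing as a solver failure.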
A cost matrix C is generated using the street-network distance between the centroid of each census unit and each school. For each school and each census unit centroid, the straight-line distance to the nearest street is calculated, and a new node is created in the network at that point (if necessary). We then compute the street-network distance between these two nodes. The total distance between the census unit and the school is the sum of (a) the two straight-line distances and (b) the in-network distance. The resulting matrix C has dimension J x K, where each row represents one of the J census units and each column represents one of the K schools. The element \(c_{jk}\) of matrix C is the distance between the centroid of census unit j and school k.
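The distance composition can be illustrated with a small Python sketch. It assumes the street network is already available as an adjacency dictionary and that each centroid and school has already been snapped to an access node; the names `network_distances` and `build_cost_matrix`, the toy graph, and the use of Dijkstra's algorithm for the in-network part are assumptions made for illustration, not the routing procedure used in the paper.

```python
import heapq
import math

def network_distances(graph, start):
    """Dijkstra shortest-path lengths from `start` over a street graph.

    `graph` maps each node to a list of (neighbour, segment_length) pairs.
    """
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for neighbour, length in graph[node]:
            candidate = d + length
            if candidate < dist.get(neighbour, math.inf):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist

def build_cost_matrix(graph, units, schools):
    """Return the J x K matrix C of total distances.

    `units` and `schools` are lists of (access_node, straight_line_gap):
    the network node nearest to the point and the straight-line distance
    from the point to that node.
    """
    cost = []
    for unit_node, unit_gap in units:
        dist = network_distances(graph, unit_node)
        cost.append([
            # (a) the two straight-line gaps + (b) the in-network distance
            unit_gap + dist.get(school_node, math.inf) + school_gap
            for school_node, school_gap in schools
        ])
    return cost

# Hypothetical street network with three nodes (lengths in metres).
streets = {
    "a": [("b", 100), ("c", 350)],
    "b": [("a", 100), ("c", 200)],
    "c": [("a", 350), ("b", 200)],
}
units = [("a", 30), ("c", 10)]     # census unit centroids
schools = [("b", 20), ("c", 5)]    # schools
C = build_cost_matrix(streets, units, schools)
```

An unreachable school receives an infinite cost in this sketch, which keeps the matrix rectangular; a real network would need a decision on how to treat disconnected components.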
This is the central step of the proposed method. Our goal is to find the allocation matrix A (with dimension J x K), where each element \(a_{jk}\) represents the number of students from census unit j that should attend school k in order to minimise the total distance travelled. We formulate this optimisation problem as a transportation problem, a classic type of integer linear programme in which an ‘objective function’ is minimised subject to certain ‘constraints’.
A. Define Objective Function
We define the objective function as Equation (1). Since matrix C represents the distance between census units and schools, Equation (1) minimises the total distance travelled by all students, assuming that they are assigned according to matrix A.
\[ \min_{a_{jk} \in \mathbb{Z}_{\geq 0}} \left( \sum_{j=1}^{J} \sum_{k=1}^{K} a_{jk} \, c_{jk} \right) \quad \quad \textrm{(1)} \]
B. Define Constraints
We establish three constraints:
- Each element of matrix A is always a non-negative integer.
- The sum of each row in matrix A must be equal to the population demand of each census unit (\(\sum_{k=1}^{K} a_{jk} = demand_{j}\); for each of the J census units).
- The number of students assigned to school k must not exceed its maximum student capacity. In other words, the sum of each column in matrix A must be less than or equal to the capacity of school k (\(\sum_{j=1}^{J} a_{jk} \leq supply_{k}\); for each of the K schools).

C. Optimisation
To solve the optimisation problem of the objective function (1), we use the R package ‘lpSolve’ (Berkelaar, 2019). This package provides an interface between R and ‘lp_solve’ (Berkelaar et al., 2004), a Mixed Integer Linear Programming solver written in ANSI C. The result is the allocation matrix A that minimises the objective function while adhering to the constraints mentioned above.
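To make the optimisation step concrete, the following Python sketch solves a toy instance of the same transportation problem. It does not use lp_solve: as a stand-in, it treats the problem as a minimum-cost flow and applies successive shortest paths with Bellman-Ford, which yields an integer optimum when demand and supply are integers. The function name `solve_transportation` and the toy data are hypothetical.

```python
def solve_transportation(cost, demand, supply):
    """Return an integer allocation A (J x K) minimising sum(a[j][k] * cost[j][k]).

    Rows of A sum to demand[j]; columns of A do not exceed supply[k].
    """
    J, K = len(demand), len(supply)
    if sum(demand) > sum(supply):
        raise ValueError("total demand exceeds total school capacity")
    # Nodes: 0 = source, 1..J = census units, J+1..J+K = schools, J+K+1 = sink.
    n = J + K + 2
    source, sink = 0, n - 1
    # Each edge is [target, residual capacity, cost, index of reverse edge].
    graph = [[] for _ in range(n)]

    def add_edge(u, v, cap, w):
        graph[u].append([v, cap, w, len(graph[v])])
        graph[v].append([u, 0, -w, len(graph[u]) - 1])

    for j in range(J):
        add_edge(source, 1 + j, demand[j], 0)
        for k in range(K):
            add_edge(1 + j, 1 + J + k, demand[j], cost[j][k])
    for k in range(K):
        add_edge(1 + J + k, sink, supply[k], 0)

    inf = float("inf")
    remaining = sum(demand)
    while remaining > 0:
        # Bellman-Ford: cheapest source-to-sink path in the residual graph.
        dist = [inf] * n
        prev = [None] * n
        dist[source] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == inf:
                    continue
                for i, (v, cap, w, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + w < dist[v]:
                        dist[v] = dist[u] + w
                        prev[v] = (u, i)
                        changed = True
            if not changed:
                break
        if prev[sink] is None:
            raise ValueError("no feasible assignment found")
        # Find the bottleneck along the path, then push that many students.
        push, v = remaining, sink
        while v != source:
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        v = sink
        while v != source:
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        remaining -= push

    # Flow on edge (unit j -> school k) = initial capacity - residual capacity.
    allocation = [[0] * K for _ in range(J)]
    for j in range(J):
        for v, cap, _, _ in graph[1 + j]:
            if v > J:  # only unit -> school edges point at school nodes
                allocation[j][v - 1 - J] = demand[j] - cap
    return allocation

# Hypothetical toy instance: 3 census units, 2 schools, distances in metres.
cost = [[300, 900], [500, 400], [200, 700]]
demand = [20, 10, 15]
supply = [25, 30]
A = solve_transportation(cost, demand, supply)
total_distance = sum(A[j][k] * cost[j][k] for j in range(3) for k in range(2))
```

In this toy instance the nearest school for units 0 and 2 has only 25 places for 35 students, so the solver must divert some of them; it diverts students from the unit for which the extra distance is smallest. For a city-sized problem a dedicated LP solver, as used in the paper, would be the practical choice.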
Based on the allocation matrix A that relates census units and schools, it is possible to assign to each school a number of students proportional to the educational composition of each census unit. To achieve this, we use information on the educational profile of households in each census unit. For example, if matrix A assigns 20 students from census unit ‘Census01’ to school ‘School01’, and the educational profile of the households in that census unit is 0.5/0.3/0.2 (PRI, SEC, SUP, respectively), then ‘School01’ will be assigned 10 students with PRI background, 6 students with SEC background, and 4 students with SUP background. By repeating this process for all census units with students assigned to each school, we can obtain the expected educational background profile for each school based on the model. The result is a matrix of K x 3 (one column for each educational level), indicating the number of students assigned by the model to each school k for each educational category. This ‘modelled’ composition can be compared to the ‘real’ composition using segregation indices.
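The proportional assignment described above amounts to multiplying each allocation by the census unit's educational profile and summing by school. The Python sketch below illustrates this. The function names and toy data are hypothetical, and the dissimilarity function uses the standard two-group formula \(D = \frac{1}{2}\sum_k |p_k/P - s_k/S|\), which is assumed here to correspond to the index reported in the paper.

```python
def school_composition(allocation, profiles):
    """Return a K x 3 matrix of expected students per school and education level.

    allocation[j][k]: students from census unit j assigned to school k.
    profiles[j]: PRI/SEC/SUP shares of census unit j (each row sums to 1).
    """
    n_schools = len(allocation[0])
    composition = [[0.0, 0.0, 0.0] for _ in range(n_schools)]
    for j, row in enumerate(allocation):
        for k, students in enumerate(row):
            for g in range(3):
                composition[k][g] += students * profiles[j][g]
    return composition

def dissimilarity(composition, g1, g2):
    """Two-group dissimilarity index between education levels g1 and g2.

    Assumes both groups have at least one student in total.
    """
    total1 = sum(row[g1] for row in composition)
    total2 = sum(row[g2] for row in composition)
    return 0.5 * sum(abs(row[g1] / total1 - row[g2] / total2)
                     for row in composition)

# Example from the text: 20 students from 'Census01' go to 'School01',
# with an educational profile of 0.5/0.3/0.2 (PRI, SEC, SUP).
text_example = school_composition([[20]], [[0.5, 0.3, 0.2]])

# Hypothetical two-unit, two-school case for the index.
allocation = [[20, 0], [10, 10]]
profiles = [[0.5, 0.3, 0.2], [0.1, 0.3, 0.6]]
modelled = school_composition(allocation, profiles)
d_pri_sup = dissimilarity(modelled, 0, 2)
```

The same `dissimilarity` call applied to the observed school composition would give the ‘real’ value against which the modelled one is compared.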